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Abstract — l Truly autonomous systems require the ability to 
monitor and adapt their internal body scheme throughout their 
entire lifetime. In this paper, we present an approach allowing 
a robot to learn from scratch and maintain a generative model 
of its own physical body through self-observation with a single 
monocular camera. We represent the robot's internal model by 
a compact Bayesian network consisting of local models that 
describe the physical relationships between neighboring body 
parts. We introduce a flexible Bayesian framework that allows to 
simultaneously select the maximum-likely network structure and 
to learn the underlying conditional density functions. Changes in 
the robot's physiology can be detected by identifying mismatches 
between model predictions and the self-perception. To quickly 
adapt the model to changed situations, we developed an efficient 
search heuristic that starts from the structure of the best explai- 
ning memorized network and then replaces local components 
where necessary. In experiments carried out with a real robot 
equipped with a 6-DOF manipulator as well as in simulation, we 
show that our system can quickly adapt to changes of the body 
physiology in full 3D space, in particular with limited visibility, 
noisy and partially missing observations, and without the need 
for proprioception. 

I. Introduction 

Autonomous robots deployed in real world environments 
have to deal with situations in which components change 
their behavior or properties over time. Such changes can for 
example come from deformations of robot parts or material 
fatigue. Additionally, to make proper use of tools, a robot 
should be able to incorporate the tool into its own body scheme 
and to adapt the gained knowledge in situations in which the 
tool is grabbed differently. Finally, components of the robot 
might get exchanged or replaced by newer parts that no longer 
comply with the models engineered originally. 

Kinematic models are widely used in practice, especially in 
the context of robotic manipulation [3, 4]. These models are 
generally derived analytically by an engineer [5] and usually 
rely heavily on prior knowledge about the robots' geometry 
and kinematic parameters. As robotic systems become more 
complex and versatile or are even delivered in a completely 
reconfigurable way, there is a growing demand for techniques 
allowing a robot to automatically learn body schemes with no 
or only minimal human intervention. 

1 Parts of this work will appear in the following conference proceedings. The 
initial work on bootstrapping, convergence behavior, prediction and control 
accuracy has been intoduced in [1] using a 2-DOF robotic manipulator. In 
a second paper [2], we extend the system towards life-long monitoring and 
efficient adapation, and present experiments with a 6-DOF manipulator. 




Fig. 1. Upper left: Our 6-DOF robotic manipulator arm learns and monitors 
its own body-scheme using an external monocular camera and visual markers. 
Upper right: After a different tool is placed in the robot's end-effector, the 
model predictions do not fit the current observations anymore. Bottom: The 
current body scheme linking action signals a$ and body parts Xj using local 
models Aj^j~. Here, a mismatch between the internal model and recent self- 
observation has been detected at A6^7. 



Clearly, such a capability would not only facilitate the de- 
ployment and calibration of new robotic systems but also allow 
for autonomous re-adaptation when the body scheme changes, 
e.g., through regular wear- and- tear over time. Furthermore, the 
ability to learn a body scheme is important in the context of 
tool use scenarios in which a robot has to identify the effects 
of its actions on the tool. 

In this paper, we investigate how to equip autonomous 
robots with the ability to learn and adapt their own body 
schemes and kinematic models using exploratory actions and 
self-perception only. We propose an approach to learn a 
Bayesian network for the robot's kinematic structure including 
the forward and inverse models relating action commands and 
body pose. More precisely, we start with a fully connected 
network containing all perceivable body parts and available ac- 
tion signals, perform random "motor babbling," and iteratively 
reduce the network complexity by analyzing the perceived 
body motion. At the same time, we learn non-parametric 
regression models for all dependencies in the network, which 
can later be used to predict the body pose when no perception 
is available or to allow for gradient-based posture control. 

One of the major advantages of the approach presented in 
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Fig. 2. Continued experiment from Figure 1. The robot samples a local 
model as replacement for the mismatching component A6^7. Left: The first 
newly sampled model (Ag^ 7 ) has high uncertainty, because of the missing 

dependency on action clq. Right: The second sampled model (Ag^ 7 ) is a 
more suitable replacement for the mismatching component. 



this paper is that it addresses all of the following practical 
problems that frequently arise in robotic manipulation tasks 
in a single framework: 

• Prediction: If both the structure and the CDFs of the 
Bayesian network are known, the robot is able to predict 
for a given action command the expected resulting body 
configuration. 

• Control: Conversely, given a target body pose, our ap- 
proach is able to generate appropriate action commands 
that will lead to this pose. 

• Model testing: Given both a prediction and an observati- 
on of the current body pose, the robot is able to estimate 
the accuracy of its own pose predictions. Model accuracy 
can, for example, be defined in terms of a distance metric 
or a likelihood function. 

• Learning: Given a sequence of action signals and the 
corresponding body postures, the Bayesian network and 
its parameters can be learned from the data. 

• Discovering the network structure: When the structure 
of the Bayesian network is unknown, the robot is able to 
build it from the available local models which are most 
consistent with the observed data. 

• Failure detection and model adaptation: When the ro- 
bot's physiology changes, e.g., when a joint gets blocked 
or is deformed, or a visual marker is changed, this is 
efficiently detected so that only the affected local models 
of the Bayesian network need to be replaced. 

II. Related Work 

The problem of learning kinematics of robots has been 
investigated heavily in the past. For example, Kolter and 
Ng [6] enable a quadruped robot to learn how to follow om- 



nidirectional paths using dimensionality reduction techniques 
and based on simulations. Their key idea is to use the simulator 
for identifying a suitable subspace for policies and then to 
learn with the real robot only in this low-dimensional space. 
A similar direction has been explored by Dearden et al. [7], 
who applied dimensionality reduction techniques to unveil the 
underlying structure of the body scheme. Similar to this work, 
their approach is formulated as a model selection problem 
between different Bayesian networks. Another instance of 
approaches based on dimensionality reduction is the work by 
Grimes et al. [8] who applied the principal component analysis 
(PCA) in conjunction with Gaussian process regression for 
learning walking gaits on a humanoid robot. 

Yoshikawa et al. [9] used Hebbian networks to discover the 
body scheme from self-occlusion or self- touching sensations. 
Later, [10] learned classifiers for body/non-body discrimina- 
tion from visual data. Other approaches used for example 
nearest- neighbor interpolation [11] or neural networks [12]. 
Recently, Ting et al. [13] developed a Bayesian parameter 
identification method for nonlinear dynamic systems, such as 
a robotic arm or a 7-DOF robotic head. 

The approach presented in this paper is also related to 
the problem of self-calibration which can be understood as 
a subproblem of body scheme learning. When the kinematic 
model is known up to some parameters, they can in certain 
cases be efficiently estimated by maximizing the likelihood of 
the model given the data [14]. Genetic algorithms have been 
used by Bongard et al. [15] for parameter optimization when 
no closed form is available. To a certain extend, such methods 
can also be used to calibrate a robot that is temporarily using 
a tool [16]. In contrast to the work presented here, such 
approaches require a parameterized kinematic model of the 
robot. 

To achieve continuous self -modeling, Bongard et al. [17] 
recently described a robotic system that continuously learns 
its own structure from actuation- sensation relationships. In 
three alternating phases (modeling, testing, prediction), their 
system generates new structure hypotheses using stochastic 
optimization, which are validated by generating actions and 
by analyzing the following sensory input. In a more general 
context, Bongard et al. [18] studied structure learning in 
arbitrary non-linear systems using similar mechanisms. 

In contrast to all the approaches described above, we 
propose an algorithm that both learns the structure as well 
as functional mappings for the individual building blocks. 
Furthermore, our model is able to revise its structure and 
component models on-the-fly. 

III. A Bayesian Framework for Robotic Body 
Schemes 

A robotic body scheme describes the relationship bet- 
ween available action signals (ai, . . . ,a m ), self-observations 
(Yi, . . . , Y n ), and the configurations of the robot's body parts 
(Xi, . . . , X n ). In our concrete scenario, in which we consider 
the body scheme of a robotic manipulator arm in conjunc- 
tion with a stationary, monocular camera, the action signals 




Fig. 3. Graphical model for two body parts Xi and Xj as well as their 
dependent variables. A denotes the set of independent action variables that 
cause a local transformation Ai_^- . Yi and Yj are the observed part locations, 
and Zi^j is their relative geometric transformation. 



ai G M are real- valued variables corresponding to the joint 
angles. Whereas the Xi G M 6 encode the 6-dimensional poses 
(3D Cartesian position and 3D Euler angles) of the body 
parts w.r.t. a reference coordinate frame, the Yi G R 6 are 
generally noisy and potentially missing observations of the 
body parts. Throughout this paper, we use capital letters to 
denote 6D pose variables to highlight that these also uniquely 
define homogeneous transformation matrices, which can be 
concatenated and inverted. Note that we do not assume direct 
feedback/proprioception telling the robot how well joint i has 
approached the requested target angle a*. 

Formally, we seek to learn the probability distribution 



p(X 1 ,...,X n ,Yi,...,Y n | ai,...,a m ), 



(1) 



which in this form is intractable for all but the simplest 
scenarios. To simplify the problem, it is typically assumed 
that each observation variable Yi is independent from all other 
variables given the true configuration Xi of the corresponding 
body part and that they can thus be fully characterized by an 
observation model p{Yi \ Xi). Furthermore, if the kinematic 
structure of the robot was known, a large number of pair- wise 
independencies between body parts and action signals could 
be assumed, which in turn would lead to the much simpler, 
factorized model 



p(X u ...,X n | ai,...,a m ) = 

[ppG | parents(Xi)) • p (parents (Xi) 



ai, 



(2) 

m) - 



Here, parents(Xi) denotes the set of locations of body parts, 
which are directly connected to body part i. 

The main idea behind this work is to make the factorized 
structure of the problem explicit by introducing (hidden) 



transformation variables A^_ 



X i Xj for all pairs of 



body parts {X^Xj) as well as their observed counterparts 
Zi^j := Y i ~ 1 Yj. Here, we use the 6D pose vectors X and 
Y as their equivalent homogeneous transformation matrices, 
which means that Ai_^- reflects the (deterministic) relative 
transformation between body parts Xi and Xj. Figure 3 
depicts a local model, which fully defines the relationship 
between any two body parts Xi and Xj and their dependent 
variables, if all other body parts are ignored. 




Fig. 4. In an early learning phase, the robot knows only little about its body 
structure, i.e., all possible local models need to be considered in parallel. From 
the subset of valid local models, a minimal spanning tree can be constructed 
which, in turn, forms a Bayesian network. This can subsequently be used as 
a body scheme for prediction and control. 



Since local models are easily invertible (Aj_^- are homoge- 
neous transformations), any set of n — 1 local models which 
form a spanning tree over all n body parts defines a model 
for the whole kinematic structure. 

A. Local Models 

The local kinematic models are the central concept in 
our body scheme framework. A local model M describes 
the geometric relationship p^(Z^j \ Ai^j) between two 
observed body parts Yi and Yj, given a subset of the action 
signal Ai^j C {a 1 ,...,a n }. 

The probability distribution underlying a local model can 
be defined in various ways. If an analytic model of the 
robot exists from its specifications, it can be used directly 
to construct pm(Z^j \ Ai^j). The standard way to describe 
a geometric model for robot manipulators is in terms of the 
Denavit-Hartenberg parameters [3, 19]. When available, the 
advantages of these models are outstanding: they are exact 
and efficient in evaluation. When the exact model is not 
available a-priori, it needs to be learned from data. On the 
real robotic platform used in our experiments, the actions a^ 
correspond to the target angle requested from joint i and the 
observations Yi are obtained by tracking visual markers in 
3D space including their 3D orientation [20] (see the top right 
image of Figure 1). Note that the >Vs are inherently noisy and 
that missing observations are common, for example in the case 
of (self-)occlusion. We learn the transformation parameters of 
the local models independently using the Gaussian process 
model for regression [21]. 

IV. Learning the Network Topology 

If no prior knowledge about the robot's body scheme exists, 
we initialize a fully connected network model (see Figure 4), 
resulting in a total set of ^£L (2) (T) ^ oca ^ m °dels. Given 
a set of self observations, the robot can determine the validity 
of the local models by evaluating the prediction error on a 
seperate test set. 

From the superset of all valid local models M va ud = 
{.Mi, . . .}, we seek to select the minimal subset M C M m ^ 
that covers all body part variables and simultaneously maximi- 
zes the overall model fit q(V \ M) := Umem ^ V I M )- Here > 
we use a model quality measure q(V | M) that is proportional 
to both the prediction accuracy and to a penalty term for the 



complexity of the model (see [2] for details). It turns out 
that M can be found efficiently by computing the minimal 
spanning tree of M va ud taking the model quality measure of 
the individual local models as the cost function. Such a body 
spanning tree needs to cover all body parts X\ , . . . , X n but 
not necessarily all action components of ai, . . . , a m . 

V. Experiments 

We tested our approach in a series of experiments, both on 
a real robot and in simulation. The goal of our experiments 
was to verify that 

1) model training and selection converges quickly to the 
correct kinematic chain, 

2) the resulting body scheme can be used for accurate 
prediction and control, 

3) changes in the robot's body (blocked joints / deformati- 
ons) at a later point are detected confidently , and 

4) the body scheme is updated automatically without hu- 
man intervention. 

The robot used to carry out the experiments is equipped 
with a 6-DOF manipulator composed of Schunk PowerCube 
modules. The total length of the manipulator is around 1.20m. 
With nominal noise values of (a 'joints = 0.02°), the reported 
joint positions of the encoders were considered to be suffi- 
ciently accurate to compute the ground truth positions of the 
body parts from the known geometrical properties of the robot. 
Visual perception was obtained by using a Sony DFW-SX900 
Fire Wire-camera at a resolution of 1280x960 pixels. On top 
of the robot's joints, 7 black-and-white markers were attached 
(see Figure 1), that were detectable by the ARToolkit vision 
module [20]. Per image, the system perceives the unfiltered 
6D poses of all detected markers. The standard deviation of 
the camera noise was measured to (J ma rkers = 44mm in 3D 
space, which is acceptable considering that the camera was 
located two meters apart from robot. 

In order for a local model to be valid, its translational and 
rotational error on the test set needed to be below a threshold 
of trans = 3a trans = 150mm and 6 rot = 3a ro t = 45°, with 
&trans and o rot as the standard deviation of the translational 
and rotational observation noise, respectively. The prediction 
accuracy of each local model was evaluated using a test set 



with size YD- 



testing \ 



5, and new local models were trained 



from a (distinct) set of up to \D- 



= 30 samples. 

A. Evaluation of Model Accuracy 

To quantitatively evaluate the accuracy of the kinematic 
models learned from scratch as well as the convergence be- 
havior of our learning approach, we generated random action 
sequences and analyzed the intermediate models using a 2- 
DOF robot of which the kinematic model is perfectly known. 

Figure 5 gives the absolute errors of prediction and control 
after certain numbers of observations have been processed. For 
a reference, we also give the average observation noise, i.e. 
the absolute localization errors of the visual markers. 

As can be seen from the diagram, the body scheme con- 
verges robustly within the first 10 observations. After about 
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Fig. 5. Prediction and control errors for a kinematic model that is learned 
from scratch. Already after 7 samples, the average prediction error is lower 
than the average localization error of the visual markers. 



15 training samples, the accuracy of the predicted body 
part positions even outperformed the accuracy of the direct 
observations. The latter is a remarkable result as it means that, 
although all local models are learned from noisy observations, 
the resulting model is able to blindly predict positions that 
are more accurate than immediate perception. The figure also 
gives the accuracy of the gradient-based control algorithm. 
Here, we used an additional marker for defining a target 
location for the robot's end effector. We learned the full body 
scheme model from scratch as in the previous experiment and 
used the gradient-based control algorithm to bring the end 
effector to the desired target location. The average positioning 
error is in the order of the perception noise (approx. 50mm, see 
Figure 5), i.e. slightly higher than the prediction error alone. 

B. Recovery after Joint stuck 

We generated a large sequence of random motor commands 
(ai, . . . , a m ). Before accepting a pose, we checked that the 
configuration would not cause any (self-)collisions, and that 
the markers of interest (X 6 and X 7 ) would potentially be 
visible on the camera image. This sequence was sent to the 
robot and after each motion command, the observed marker 
positions (Yi,...,F n ) were recorded. In the rare case of a 
anticipated or a real (self-)collision during execution, the robot 
stopped and the sample was rejected. Careful analysis of the 
recorded data revealed that, on average, the individual markers 
were visible only in 86.8% of the time with the initial body 
layout. In a second run, we blocked the robot's end-effector 
joint <24, such that it could not move, and again recorded a log- 
file. Note that we allow arbitrary 3D motion (just constrained 
by the geometry of the manipulator) and thus do not assume 
full visibility of the markers. 

An automated test procedure was then used to evaluate the 
performance and robustness of our approach. For each of the 
20 runs, a new data set was sampled from the recorded log- 
files, consisting of 4 blocks with N = 100 data samples each. 
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Fig. 6. At t = 100, a joint gets blocked, which causes the initial local 
model p engineered (^6^7 I ^4) to produce substantially larger prediction 
errors. At t = 126, the robot samples a new local model p/ earne d(A6 —> 7) 
as replacement. 



Fig. 7. The absolute prediction error of the combined kinematic model 
p(Z 1^7 | a± , . . . , a a) of our 6-DOF manipulator. This model is composed of 
6 individual local models of which one is replaced by a newly learned model 
at t = 126 (cmp. Figure 6). As can be seen from the plot, the prediction 
accuracy recovers quickly after each of the three external events. 



The first and the third block were sampled from the initial body 
shape, while the second and the fourth block were sampled 
from the log-file where the joint got blocked. 

Figure 6 shows the prediction error of the local models 
predicting the end-effector pose. As expected, the prediction 
error of the engineered local model increases significantly after 
the end-effector joint gets blocked at t = 100. After a few 
samples, the robot detects a mismatch in its internal model 
and starts to learn a new dynamic model (around t = 130), 
which quickly reaches the same accuracy as the original, 
engineered local model. At t = 200, the joint gets repaired 
(unblocked). Now the estimated error of the newly learned 
local model quickly increases while the estimated error of the 
engineered local model decreases rapidly towards its initial 
accuracy. Later, at t = 300, the joint gets blocked again in the 
same position, the accuracy of the previously learned local 
model increases significantly, and thus the robot can re-use 
this local model instead of having to learn a new one. 

The results for 20 reruns of this experiment are given in 
Figure 7. The hand-tuned initial geometrical model evaluates 
to an averaged error at the end-effector of approx. 37mm. 
After the joint gets blocked at t = 100, the error in prediction 
increases rapidly. After t = 115, a single new local models 
gets sampled, which already is enough to bring down the over- 
all error of the combined kinematic model to approximately 
51mm. Training of the new local model is completed at around 
t = 135. 

Later at t = 200, when the joint gets un-blocked, the error 
estimate of the combined kinematic model increases slightly, 
but returns much faster to its typical accuracy: switching back 
to an already known local model requires much fewer data 
samples than learning a new model (see Table I). At t = 300, 
the same quick adaption can be observed when the joint gets 
blocked again. 



C. Recovery from a Deformed limb 

In a second experiment 2 , we changed the end-effector limb 
length and orientation and applied the same evaluation proce- 
dure as in the previous subsection. This was accomplished by 
placing a tool with an attached marker in the robot's gripper 
at different locations (see Figure 1). 

Although the overall result closely resembles the case of a 
blocked joint, there are a few interesting differences. After the 
tool gets displaced at t = 100, on average two local models 
need to be sampled because the first one is not sufficient. 

Also note that it takes much more training samples for the 
GPs to learn and validate the underlying probability distribu- 
tion p(Zq^ 7 I (14) (see Table I). The prediction accuracy of 
the whole system closely resembles the levels as in the case 
of the blocked joint: On average, we measured after recovery 
an accuracy of 47mm. 

D. Controlling a Deformed Robot 

Finally, we ran a series of experiments to verify that dy- 
namically maintained body schemes can be used for accurate 

2 A demonstration video of this experiment can be found on the in- 
ternet at http://www.informatik.uni-freiburg.de/~sturm/ 
media/ re source s/public/zora-7dof- demo . avi 

TABLE I 

Evaluation of the recovery time required after being exposed 

to different types of failures. in each of the 4 x 20 runs, full 

recovery was after each event robustly achieved. 
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same failure 


91.9% 


Joint blocked 


16.50 


0.45 


0.65 






± 1.20 


± 0.86 
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79.0% 


Limb deformed 


20.20 
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12.10 
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± 0.83 


± 1.64 
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Fig. 8. The manipulator robot with a deformed limb has to follows the blue 
target trajectory. With a static body model, it suffers from strong derivation 
(red trajectory). By using our approach, the body scheme is dynamically 
adapted, and the trajectory is very well approached (green trajectory). 



positioning and control. The experiments were executed on a 
6-DOF manipulator in simulation. 

We defined a 3D trajectory consisting of 30 way-points 
that the manipulator should approach by inverse kinematics 
using its current body scheme, see Figure 8. When the initial 
geometric model was used to follow the trajectory by using 
the undamaged manipulator, a positioning accuracy of 7.03mm 
was measured. When the middle limb was deformed by 45°, 
the manipulator with a static body scheme was significantly 
off course, leading to an average positioning accuracy of 
189.35mm. With dynamic adaptation enabled, the precision 
settled at 15.24mm. This shows that dynamic model adaption 
enables a robot to maintain a high positioning accuracy after 
substantial changes to its body physiology. 



VI. Conclusion 

In this paper, we presented a novel approach to life-long 
body scheme adaptation for a robotic manipulation system. 
Our central idea is to continuously learn a large set of 
local kinematic models using non-parametric regression and to 
search for the best arrangement of these models to represent 
the full system. 

In experiments carried out with a real robot and in simu- 
lation, we demonstrated that our system is able to deal with 
missing and noisy observations, operates in full 3D space, and 
is able to perform relevant tasks like prediction, control, and 
online adaptation after failures. Challenging topics for further 
investigation include developing an active exploration strategy, 
learning from marker-less observations, point-like features, or 
range observations and learning for fully unobservable parts 
of the robot. 
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